Quasar: A Scalable Naming Language for Very Large File Collections
Abstract
As storage capacities increase, managing petabytes of data becomes increasingly challenging. One reason is the POSIX file system interface, originally designed in the 1970s in the context of file collections many orders of magnitude smaller than those found in today’s petabyte-scale storage systems. We show the scalability problems of the naming language imposed by POSIX, that is, the language used to identify an individual file or a group of files. We identify search, attributes, and relationships as features common to popular applications that manage large file collections. The increasing size of file collections has already motivated file system designers to include support for these features, so that highly optimized implementations can be shared across all applications. Existing approaches, however, treat these features as add-ons to the POSIX naming language. One consequence of this lack of integration is that searches cannot be scoped to a fragment of a file system name space, which makes search hard to scale to very large file collections. We present a naming language (Quasar) that offers operators for search and view specification within file systems. Quasar supports scope limiting by subtrees and by link distance. A Quasar name expands into a collection of Quasar names that represent a connected graph. We evaluate Quasar by contrasting its use with SQL and XPath in scenarios that are typical for very large file collections.
Similar resources
QUASAR: Interaction with File Systems Using a Query and Naming Language
As storage capacities increase, finding and organizing data becomes increasingly challenging. Conventional approaches to organization for file systems fail to effectively provide for the needs of petascale storage, because hierarchical namespaces do not scale and must rely on ad hoc utilities. Previous solutions do not incorporate relationships as search terms, nor were they designed for today’...
Clue Tables: A Distributed, Dynamic-Binding Naming Mechanism
This paper presents a distributed, dynamic naming mechanism called clue tables for building highly scalable, highly available distributed file systems. The clue tables naming mechanism is distinctive in three aspects. First, it is designed to cope well with the hierarchical structure of modern large-scale computer networks. Second, it implicitly carries out load balancing among servers to im...
A Metadata-Rich File System
Despite continual improvements in the performance and reliability of large scale file systems, the management of file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while rel...
Design and Implementation of a Metadata-Rich File System
Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file syste...
Ogmios: a scalable NLP platform for annotating large web document collections
Search engines like Google or Yahoo offer access to billions of textual web pages. These tools are very popular and seem to be sufficient for a large number of general user queries on the Internet. However, some other queries are more complex, requiring specific knowledge or processing strategies: no really satisfactory solution exists for these requests. There is thus a need for more specific ...